HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud
Authors
Abstract
Eliminating duplicate data in the primary storage of clouds increases the cost-efficiency of cloud service providers and reduces users' cost of using cloud services. Most existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use post-processing deduplication that runs during system idle time to avoid a negative impact on I/O performance. However, neither works well on cloud servers running multiple services or applications, for two reasons. First, the temporal locality of duplicate data writes may not exist in some primary storage workloads, so inline caching often fails to achieve a good deduplication ratio. Second, post-processing deduplication allows duplicate data to be written to disk; it therefore does not provide the benefit of I/O deduplication and requires high peak storage capacity. This paper presents HPDedup, a Hybrid Prioritized data Deduplication mechanism for storage systems shared by applications running in co-located virtual machines or containers, which fuses an inline and a post-processing process for exact deduplication. In the inline deduplication phase, HPDedup provides a fingerprint caching mechanism that estimates the temporal locality of duplicates in data streams from different VMs or applications and prioritizes cache allocation for these streams based on that estimation. HPDedup also allows different deduplication thresholds for different streams based on their spatial locality, in order to reduce disk fragmentation. The post-processing phase removes from disk those duplicates whose fingerprints could not be cached because of weak temporal locality. The hybrid deduplication mechanism significantly reduces the amount of redundant data written to the storage system while maintaining inline data-writing performance.
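The hybrid flow described above (an inline phase whose fingerprint cache is shared among streams according to an estimate of their temporal locality, followed by a post-processing pass over what reached disk) can be illustrated with a toy Python sketch. It is not HPDedup's implementation: the class `HybridDedupSketch`, its methods, the use of SHA-1 fingerprints, and the locality estimate (each stream's smoothed inline hit rate) are all hypothetical stand-ins, and the per-stream spatial-locality thresholds are omitted.

```python
import hashlib
from collections import OrderedDict


def fingerprint(block: bytes) -> str:
    """Content fingerprint for exact deduplication (SHA-1 chosen here as an assumption)."""
    return hashlib.sha1(block).hexdigest()


class HybridDedupSketch:
    """Hypothetical toy model: prioritized inline fingerprint cache + post-processing pass."""

    def __init__(self, total_cache_entries: int):
        self.total = total_cache_entries
        self.caches = {}   # stream id -> OrderedDict(fingerprint -> disk address), LRU order
        self.stats = {}    # stream id -> [inline hits, writes]
        self.disk = []     # physical blocks actually written
        self.logical = {}  # (stream id, logical block number) -> disk address

    def _quotas(self):
        # Split the cache in proportion to each stream's smoothed inline hit rate,
        # a stand-in for a temporal-locality estimate (not HPDedup's estimator).
        scores = {s: (h + 1) / (w + 2) for s, (h, w) in self.stats.items()}
        norm = sum(scores.values())
        return {s: max(1, int(self.total * v / norm)) for s, v in scores.items()}

    def _rebalance(self):
        for stream, quota in self._quotas().items():
            cache = self.caches[stream]
            while len(cache) > quota:
                cache.popitem(last=False)  # evict the least recently used fingerprint

    def write(self, stream, lbn, block: bytes):
        """Inline phase: skip the disk write if the fingerprint is in this stream's cache."""
        cache = self.caches.setdefault(stream, OrderedDict())
        stats = self.stats.setdefault(stream, [0, 0])
        fp = fingerprint(block)
        stats[1] += 1
        if fp in cache:  # inline hit: map the logical block to the existing copy
            cache.move_to_end(fp)
            stats[0] += 1
            self.logical[(stream, lbn)] = cache[fp]
        else:  # inline miss: the block goes to disk even if it is a duplicate
            self.disk.append(block)
            cache[fp] = self.logical[(stream, lbn)] = len(self.disk) - 1
        self._rebalance()

    def post_process(self) -> int:
        """Offline phase: find on-disk duplicates the inline cache missed and remap them."""
        canonical, remap = {}, {}
        for addr, block in enumerate(self.disk):
            fp = fingerprint(block)
            if fp in canonical:
                remap[addr] = canonical[fp]
            else:
                canonical[fp] = addr
        self.logical = {k: remap.get(a, a) for k, a in self.logical.items()}
        for cache in self.caches.values():
            for fp, a in list(cache.items()):
                cache[fp] = remap.get(a, a)
        return len(remap)  # redundant physical blocks that could now be reclaimed


if __name__ == "__main__":
    d = HybridDedupSketch(total_cache_entries=4)
    d.write("vm1", 0, b"A")
    d.write("vm1", 1, b"A")  # inline hit in vm1's cache: no new disk write
    d.write("vm2", 0, b"A")  # miss: vm2's cache has never seen this block
    print(len(d.disk))       # 2
    print(d.post_process())  # 1
```

In the usage lines, the block written by `vm2` misses inline because that stream's cache has not seen it, so it reaches disk and is only detected by the post-processing pass. That is the division of labor between the two phases that the abstract describes; in this sketch, weak locality shows up as frequent LRU evictions from a stream's small cache share.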
Our experimental results show that HPDedup clearly outperforms state-of-the-art primary storage deduplication techniques in terms of inline cache efficiency and primary deduplication efficiency.
Keywords: Data Deduplication; Cache Management; Primary Storage; Cloud Service
Similar resources
A Review - Secured Approach for Hybrid Cloud De-Duplication
A hybrid cloud is a combination of public and private clouds bound together by either standardized or proprietary technology that enables data and application portability. The proposed system aims to efficiently solve the problem of deduplication with differential privileges in cloud computing. A hybrid cloud architecture consisting of a public cloud and a private cloud and the data owners only...
Data Deduplication: A Technique for Efficient Storage in Cloud
Cloud computing allows users to outsource storage and computation to servers over the Internet. In this paper we propose a data deduplication technique to reduce the amount of storage space and to save bandwidth. The convergent encryption technique has been proposed to protect the confidentiality of sensitive data. Our scheme has the feature of access which allows only valid users to share the dat...
Deduplication in Hybrid Cloud with Secure Data
Deduplication, also called the single-instance technique, removes redundant data and stores only the original copy, which saves storage space, to protect sensitive data. Data security and access to particular data are very important today; hence deduplication features have been widely used in cloud storage systems. There was a drawback in previous work wher...
A Survey On: Secure Data Deduplication on Hybrid Cloud Storage Architecture
Data deduplication is one of the most important data compression techniques used for removing duplicate copies of repeating data, and it is widely used in cloud storage to reduce storage space and save bandwidth. To keep the confidentiality of sensitive data while supporting the deduplication, to encrypt the data before outsourcing convergent encryption technique h...
An Enhanced Multi-Layered Cryptosystem Based Secure and Authorized Deduplication Model in Cloud Storage System
Data deduplication is an important technique for eliminating redundant data. Instead of storing multiple identical files, it stores only a single copy of the file. In most organizations, storage systems contain many pieces of duplicate data. For example, the same file may be saved in several different places by different users. Deduplication eliminates these extra copies by saving just one copy of the data an...
Journal title: CoRR
Volume: abs/1702.08153
Publication date: 2017